Implementing fixed-point polynomials in hardware logic

ABSTRACT

A method implements fixed-point polynomials in hardware logic. In an embodiment the method comprises distributing a defined error bound for the whole polynomial between operators in a data-flow graph for the polynomial and optimizing each operator to satisfy the part of the error bound allocated to that operator. The distribution of errors between operators is updated in an iterative process until a stop condition (such as a maximum number of iterations) is reached.

BACKGROUND

As is known to those of skill in the art, a polynomial is a mathematical expression comprising one or more terms, each of which consists of a constant multiplied by one or more variables raised to a non-negative integer exponent (e.g. a+bx+cx², where a, b and c are the constants and x is the variable).

Polynomials are very common as they can be used to calculate a variety of values and/or model certain behavior. For example, a point (a₁, a₂, a₃) 102 is determined to be on one side of a triangle 104 defined by three points (0, 0, 0), (b₁, b₂, b₃), and (c₁, c₂, c₃) as shown in FIG. 1a if equation (1) below is true:
a₁b₂c₃−a₁b₃c₂−a₂b₁c₃+a₂b₃c₁+a₃b₁c₂−a₃b₂c₁≥0  (1)

In another example, a line passing through the points (0, 0, 0) and (v₁, v₂, v₃) 106 is determined to pass through a sphere 108 defined by a radius r and centre (c₁, c₂, c₃), as shown in FIG. 1b, if equation (2) is true:
(v₁c₁+v₂c₂+v₃c₃)²−(v₁²+v₂²+v₃²)(c₁²+c₂²+c₃²−r²)≥0  (2)
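As an illustration only, the two tests in equations (1) and (2) can be sketched in Python using exact integer arithmetic. The function names are hypothetical, and the sketch assumes that the last factor of equation (2) is c₁²+c₂²+c₃²−r²:

```python
def on_positive_side(a, b, c):
    """Equation (1): test point a against the triangle (0,0,0), b, c."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    c1, c2, c3 = c
    value = (a1 * b2 * c3 - a1 * b3 * c2 - a2 * b1 * c3
             + a2 * b3 * c1 + a3 * b1 * c2 - a3 * b2 * c1)
    return value >= 0


def line_hits_sphere(v, c, r):
    """Equation (2): line through the origin and v against a sphere (c, r)."""
    dot_vc = sum(vi * ci for vi, ci in zip(v, c))
    dot_vv = sum(vi * vi for vi in v)
    dot_cc = sum(ci * ci for ci in c)
    return dot_vc ** 2 - dot_vv * (dot_cc - r * r) >= 0
```

Both functions evaluate a single polynomial in their inputs and compare it with zero, which is the kind of computation the methods described below aim to implement in hardware logic.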

In further examples, polynomials may be used to implement filters, perform memory address calculations, perform binary to decimal conversion, etc. A variety of number and graphics formats have polynomial interpretations (e.g. floating-point formats, UNORM and SNORM graphics formats) and certain logical operations (e.g. inversion, XORing each bit of a with bit s) are also polynomial.

When a polynomial is evaluated in hardware it may be evaluated using fixed-point or floating point number representations. As is known to those skilled in the art, a fixed-point number representation is a representation of a number that has a fixed number of digits after the radix point (e.g. decimal point or binary point). In contrast, a floating point number representation of a number is a representation of a number where the radix point is not fixed (i.e. it can “float”). In other words the radix point can be placed anywhere within the representation.
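A minimal sketch of a binary fixed-point representation is given below, assuming that a value is stored as an integer count of steps of size 2^(−frac_bits). The helper names to_fixed and from_fixed are hypothetical and are used for illustration only:

```python
def to_fixed(value, frac_bits):
    """Quantize a real value to an integer number of 2**-frac_bits steps."""
    return round(value * (1 << frac_bits))


def from_fixed(raw, frac_bits):
    """Recover the real value represented by a fixed-point integer."""
    return raw / (1 << frac_bits)


raw = to_fixed(3.14159, 8)    # 8 fractional bits
approx = from_fixed(raw, 8)   # nearest representable value
```

With 8 fractional bits the rounding error of to_fixed is at most half a step, i.e. 2^(−9).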

The embodiments described below are not limited to implementations which solve any or all of the disadvantages of known systems and methods for implementing fixed-point polynomials.

SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

Methods of implementing fixed-point polynomials in hardware logic are described. In an example embodiment the method comprises distributing a defined error bound for the whole polynomial between operators in a data-flow graph for the polynomial and optimizing each operator to satisfy the part of the error bound allocated to that operator. The distribution of errors between operators is updated in an iterative process until a stop condition (such as a maximum number of iterations) is reached.

An aspect provides an example method of generating an implementation of a fixed-point polynomial with rational coefficients in hardware logic that satisfies a defined error bound. The method receives the polynomial as a data-flow graph comprising one or more nodes. A node is either a sum-of-products operator or a constant division operator. An error bound is computed for each node in the data-flow graph based on the defined error bound for the polynomial. The method then optimizes a single summation array for each node using a truncation scheme and the error bound for the node. An updated error bound is calculated for each node in the data-flow graph and further optimizing is performed on the single summation array for each node using the truncation scheme and the updated error bound for the node. The calculating of updated error bounds redistributes error bounds between nodes to allow for synthesis of the fixed-point polynomial with a reduced physical size whilst satisfying the defined error bound.

A second aspect provides a computer readable storage medium having encoded thereon computer readable program code for performing the method as described above.

A third aspect provides an apparatus configured to perform lossy synthesis of a fixed-point polynomial with rational coefficients and generate an implementation of the fixed-point polynomial in RTL, the apparatus comprising: a processor; and a memory comprising computer executable instructions which, when executed, cause the processor to: receive the polynomial as a data-flow graph comprising one or more nodes, wherein a node is either a sum-of-products operator or a constant division operator; compute an error bound for each node in the data-flow graph based on the defined error bound for the polynomial; optimize a single summation array for each node using a truncation scheme and the error bound for the node; calculate an updated error bound for each node in the data-flow graph; and further optimize the single summation array for each node using the truncation scheme and the updated error bound for the node. The calculating of updated error bounds redistributes error bounds between nodes to allow for synthesis of the fixed-point polynomial with a reduced physical size whilst satisfying the defined error bound.

A fourth aspect provides an apparatus configured to compare two or more array summations when implemented in hardware logic, the apparatus comprising: a processor; and a memory comprising computer executable instructions which, when executed, cause the processor to: receive RTL data describing the two or more array summations; and calculate, for each array summation, a cost function using:

$\mathrm{area}\left(\mathrm{array}(h, w, pp)\right) \approx 7\,pp\left(1 - \left(\frac{2}{3}\right)^{f}\right) + \frac{w}{2}\left(3\left\lceil \log_{2} w \right\rceil + 11\right)$

$\text{where } f = \left\lceil \frac{\log_{2} h - 1}{\log_{2} 3 - 1} \right\rceil$

where w is a width of the array, h is a height of the array and pp is a number of bits in the array. The cost functions calculated for each array summation provide for several candidate RTL implementations to be compared and an RTL implementation with an optimal cost function to be selected for implementation in hardware.
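By way of illustration, the cost function above can be evaluated directly. The following Python sketch assumes h≥2 (so that f is non-negative) and w≥1; the function name array_area and the candidate values are hypothetical:

```python
import math


def array_area(h, w, pp):
    """Approximate area cost of a summation array.

    h: height of the array, w: width of the array,
    pp: number of bits in the array (assumes h >= 2 and w >= 1).
    """
    f = math.ceil((math.log2(h) - 1) / (math.log2(3) - 1))
    term_pp = 7 * pp * (1 - (2 / 3) ** f)
    term_w = (w / 2) * (3 * math.ceil(math.log2(w)) + 11)
    return term_pp + term_w


# Hypothetical candidates described by (h, w, pp); the lowest cost is chosen.
candidates = {"array_a": (2, 8, 16), "array_b": (3, 8, 24)}
best = min(candidates, key=lambda name: array_area(*candidates[name]))
```

Only the relative ordering of the computed costs is used when comparing candidates, so the approximation does not need to match a synthesized area exactly.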

The methods described herein may be performed by a computer configured with software in machine readable form stored on a tangible storage medium e.g. in the form of a computer program comprising computer readable program code for configuring a computer to perform the constituent portions of described methods or in the form of a computer program comprising computer program code means adapted to perform all the steps of any of the methods described herein when the program is run on a computer and where the computer program may be embodied on a computer readable storage medium. Examples of tangible (or non-transitory) storage media include disks, thumb drives, memory cards etc. and do not include propagated signals. The software can be suitable for execution on a parallel processor or a serial processor such that the method steps may be carried out in any suitable order, or simultaneously.

The hardware components described herein may be generated by a non-transitory computer readable storage medium having encoded thereon computer readable program code.

This acknowledges that firmware and software can be separately used and valuable. It is intended to encompass software, which runs on or controls “dumb” or standard hardware, to carry out the desired functions. It is also intended to encompass software which “describes” or defines the configuration of hardware, such as HDL (hardware description language) software, as is used for designing silicon chips, or for configuring universal programmable chips, to carry out desired functions.

The preferred features may be combined as appropriate, as would be apparent to a skilled person, and may be combined with any of the aspects of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention will be described, by way of example, with reference to the following drawings, in which:

FIG. 1a is a schematic diagram of a triangle defined by three three-dimensional points;

FIG. 1b is a schematic diagram of a sphere defined by a radius and a three-dimensional centre point;

FIG. 2 is an example schematic diagram of a lossy synthesizer;

FIG. 3 is an example flow diagram of an example method of operation of the lossy synthesizer shown in FIG. 2;

FIG. 4 shows an example data-flow graph;

FIG. 5 shows an example summation array for a sum-of-products operator;

FIG. 6 shows examples of array truncation schemes;

FIG. 7 is a flow diagram of an example truncation scheme which is referred to herein as ‘FRator’;

FIG. 8 is a graphical representation of the truncation scheme of FIG. 7;

FIG. 9 shows three more example data-flow graphs;

FIG. 10 is a flow diagram of another example method of operation of the lossy synthesizer shown in FIG. 2; and

FIG. 11 is a schematic diagram of an example computing device which may be configured to implement the methods described herein.

Common reference numerals are used throughout the figures to indicate similar features.

DETAILED DESCRIPTION

Embodiments of the present invention are described below by way of example only. These examples represent various ways of putting the invention into practice that are currently known to the Applicant although they are not the only ways in which this could be achieved. The description sets forth the functions of the examples and the sequence of steps for constructing and operating the examples. However, the same or equivalent functions and sequences may be accomplished by different examples.

Described herein are methods for implementing fixed-point polynomials in hardware logic which reduce resource usage (e.g. area of hardware logic that is required) whilst providing a guaranteed bounded error. A method of comparing the efficiency of multiple candidate architectures is also described. In the methods described, the polynomials are not evaluated to infinite precision but instead the methods enable exploitation of a user defined maximum absolute error and so may be described as lossy. As described in detail below, the methods described herein distribute the user defined maximum absolute error (which may also be referred to as the allowable error) between the operators in a data-flow graph (DFG) which represents the polynomial to reduce/minimize implementation cost whilst guaranteeing that the user defined maximum absolute error is not exceeded. The methods generate a hardware logic design at a register transfer level (RTL). Logic synthesis may then be used to transition from RTL to gate level to produce a gate level description which can then be laid out and implemented in silicon. In contrast to logic synthesis, the process of interpreting a system level model to produce an RTL description is referred to as high-level synthesis (HLS) and the methods described herein may be implemented in a synthesizer.

As described above, lossy synthesis is a method of synthesis (e.g. HLS) where the result is not bit identical to the input. In the methods described herein, the user specifies and controls the error and the lossy synthesizer exploits the error freedom to optimize hardware area, speed and/or power. Exploiting the error freedom comprises adjusting particular parameters within the circuits to be created. There are a number of approaches to lossy synthesis which have been proposed, including word-length optimization, imprecise operators and gate level imprecision. Word-length optimization involves choosing the precision for all internal variables whilst maintaining acceptable global (i.e. overall) error. The word-length (i.e. number of bits used to represent an internal variable) is dependent upon the range and precision of the data the variable must hold, as well as the number format. Choosing the best precision for all internal variables has, however, been shown to be NP-hard in general. Imprecise operators are operators (such as adders) which do not necessarily return the correct answer, but return an answer which is correct within a defined accuracy (e.g. rounded using a rounding scheme). Various imprecise adder and multiplier architectures have been proposed and some are described below; however, determining how to use these imprecise components to optimize certain hardware properties while maintaining an acceptable level of total output error for an entire polynomial is complex. Gate level imprecision involves removing logic gates from a circuit whilst still maintaining some level of quality of the result.

The methods described herein are implemented in a lossy synthesizer 200 of the form shown in FIG. 2. This lossy synthesizer 200 takes as input 202 a polynomial with rational coefficients and fixed-point inputs and outputs. An error specification 204 is also provided which provides a user bounded error (i.e. a maximum absolute error tolerance). The output 206 of the synthesizer 200 is RTL that is guaranteed to meet the error specification 204 and is suitable for datapath logic synthesis. A typical user requirement is that the result should be correct to the unit in the last place and this can be translated into an absolute error bound.

FIG. 3 is a flow diagram of an example method of operation of the lossy synthesizer 200. As described above, an input 202 to the lossy synthesizer is a fixed-point polynomial with rational coefficients, p, for example:

$p = \frac{a + 2b + 2c + d}{6}$

Without loss of generality, the polynomial may be written as a polynomial with integer inputs and rational coefficients, for example of the following form for an appropriate value of n:

$p = \frac{A + {2B} + {2C} + D}{6*2^{n}}$

As shown in FIG. 3, the polynomial, p, that is input to the lossy synthesizer 200, is represented in FIG. 4 as a DFG 400 with nodes that are either arithmetic sum-of-products (SOPs) 402 or constant division operators 404 (block 302). In the following description the terms ‘node’ and ‘operator’ are used interchangeably when referring to the DFG. Assuming that the input polynomial is written in a form that has integer inputs, then there are a myriad of ways that the polynomial can be written in terms of integer SOPs and constant divisions. Moreover, such an expression can always be created (e.g. x³ may instead be written as x×x²). The DFG which is received by the lossy synthesizer (in block 302) may comprise operators which are all SOP operators or all constant division operators or may comprise a combination of one or more SOP operators and one or more constant division operators.
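One possible in-memory form for such a DFG is sketched below in Python, purely as an illustration. The class names SumOfProducts and ConstantDivision, and the encoding of each term as a coefficient with a list of factors, are assumptions and are not taken from FIG. 4:

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class SumOfProducts:
    # Each term is (integer coefficient, [factors]); a factor is either an
    # input name (str) or another node of the graph.
    terms: list

    def evaluate(self, env):
        total = 0
        for coeff, factors in self.terms:
            product = coeff
            for factor in factors:
                if isinstance(factor, str):
                    product *= env[factor]
                else:
                    product *= factor.evaluate(env)
            total += product
        return total


@dataclass
class ConstantDivision:
    operand: object   # the node feeding the division
    divisor: int      # the constant d

    def evaluate(self, env):
        # Exact (lossless) reference value of the node.
        return Fraction(self.operand.evaluate(env), self.divisor)


# p = (a + 2b + 2c + d) / 6 as one SOP node feeding one constant division.
sop = SumOfProducts([(1, ["a"]), (2, ["b"]), (2, ["c"]), (1, ["d"])])
dfg = ConstantDivision(sop, 6)
reference = dfg.evaluate({"a": 1, "b": 2, "c": 3, "d": 4})
```

The evaluate methods compute the exact value of the polynomial, which can serve as the reference against which the error of a lossy implementation is measured.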

An SOP has the form:
AB±CD±E . . .
and includes multipliers, squarers (e.g. x²), constant multipliers and sets of additions. A constant division has the form:
x/d
where d is a constant. It has been shown that this division can be replaced by a single multiply-add instruction:
x/d≈⌊(ax+b)/2^(k)⌋
for suitable values of a, b and k.
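The multiply-add replacement can be illustrated with a small exhaustive search. The Python sketch below is not the selection method of the described embodiments: it assumes unsigned n-bit inputs, restricts the search to b=0 and a=⌈2^(k)/d⌉, and accepts the first k for which the result equals ⌊x/d⌋ for every input. The function name find_multiply_add is hypothetical:

```python
def find_multiply_add(d, n_bits):
    """Return (a, b, k) with (a*x + b) >> k == x // d for all n_bits inputs.

    Simplifying assumptions: b is fixed at 0 and a = ceil(2**k / d).
    Returns None if no k in the searched range is exact.
    """
    for k in range(2 * n_bits + 1):
        a = -(-(1 << k) // d)   # ceil(2**k / d) using integer arithmetic
        b = 0
        if all(((a * x + b) >> k) == x // d for x in range(1 << n_bits)):
            return a, b, k
    return None


a, b, k = find_multiply_add(3, 8)   # division of an 8-bit input by 3
```

For d=3 and 8-bit inputs the search yields a=171, b=0 and k=9, i.e. ⌊x/3⌋=⌊171x/2⁹⌋ for 0≤x≤255.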

Having received the polynomial in the form of a DFG (in block 302), the task of the lossy synthesizer is to determine how to best distribute the allowed error 204 amongst the nodes (i.e. amongst the SOPs 402 and constant division operators 404). Each of the constant division operators 404 will introduce an error and each of the SOPs 402 may introduce an error and it is the cumulative effect of these individual operator errors which must not exceed the specified allowed error 204. In the method described herein, the goal is to remove the greatest number of partial product bits as ultimately the number and configuration of the partial products determines the characteristics of the cost of implementation of the polynomial. This is implemented in an iterative process (as shown in FIG. 3 by the arrow from block 312 to block 306) which starts by allocating an initial error to each node in the DFG (block 304) and then iteratively improves the distribution of errors between nodes (in blocks 306 and 310). The iterative loop (comprising blocks 306 and 310) terminates either when the maximum iteration count (or maximum runtime) is reached (‘Yes’ in block 308, e.g. at 100 iterations) or when the newly calculated updated errors (calculated in block 310) are the same as the previously used errors for each node (‘Yes’ in block 312). The maximum iteration count (or alternatively a maximum runtime which may be used instead in block 308) may be user specified and in some examples may not be used (e.g. block 308 may be omitted such that block 306 is followed by block 310). An example of the way in which initial errors are computed (in block 304) and optimized (in block 306) is described in detail below for both a general case and for polynomials with specific characteristics (e.g. which enable simplification of the method used).

If e_(i) is the maximum absolute error permitted in the ith operator (i.e. the ith node in the DFG) and R_(i)(e) is the number of partial product bits removed in operator i due to exploiting the error freedom in all operators with bounds defined by the elements of e (where e is the vector of elements e_(i)), the total number of partial product bits removed (in block 306) is:

$\sum\limits_{i}{R_{i\;}(e)}$

If p̂(x, ε) is the actual value of the implementation of polynomial p(x) which introduces an actual error ε_(i) at node i (so |ε_(i)|≤e_(i)), then the absolute error requirement is, for a user defined bound (i.e. maximum absolute error tolerance) η:
|p(x)−p̂(x,ε)|≤η

Expanding the left hand side with respect to ε means that for some coefficients c_(α)(x) this condition can be written as follows:

$\left| \sum\limits_{\alpha} c_{\alpha}(x)\,\varepsilon^{\alpha} \right| \leq \eta$

where c_(α)(x) are particular polynomials in x and ε^(α)=ε₁^(α₁)ε₂^(α₂) . . . and so the summation is performed over the various values of vector elements of α.

A bound on the left hand side is:

$\left| \sum\limits_{\alpha} c_{\alpha}(x)\,\varepsilon^{\alpha} \right| \leq \sum\limits_{\alpha} \left( \max\limits_{x} \left| c_{\alpha}(x) \right| \right) e^{\alpha}$

where e is the vector of allocated error limits for each node. For example, if:
p(x)=x²y²
and
p̂(x,e)=(x²+e₁)(y²+e₂)+e₃
then

$\begin{aligned} p(x) - \hat{p}(x,e) &= x^{2}y^{2} - \left(x^{2} + e_{1}\right)\left(y^{2} + e_{2}\right) - e_{3} \\ &= -x^{2}e_{2} - y^{2}e_{1} - e_{1}e_{2} - e_{3} \\ &= \left(-x^{2}\right)e_{1}^{0}e_{2}^{1}e_{3}^{0} + \left(-y^{2}\right)e_{1}^{1}e_{2}^{0}e_{3}^{0} + (-1)e_{1}^{1}e_{2}^{1}e_{3}^{0} + (-1)e_{1}^{0}e_{2}^{0}e_{3}^{1} \end{aligned}$

It can be seen that each term is of the form:
c_(α₁,α₂,α₃)(x)e₁^(α₁)e₂^(α₂)e₃^(α₃)
which can be written as:
c_(α)(x)e^(α)
so c_(α)(x) is a polynomial in x which is the coefficient of a term in p(x)−p̂(x, e), and e^(α)=e₁^(α₁)e₂^(α₂)e₃^(α₃) is the error bound associated with that term, which may be formed from one or more of the absolute maximum errors (or error bounds) for each node in the DFG (e₁, e₂, e₃ in this example), and α is a vector of integers.

So a sufficient, but potentially not necessary (i.e. potentially overly restrictive), condition for the scheme to meet the user defined absolute error requirement is:

$\sum\limits_{\alpha} \lambda_{\alpha}e^{\alpha} \leq \eta$

where λ_(α) is defined as:

$\lambda_{\alpha} = \max\limits_{x} \left| c_{\alpha}(x) \right|$

The following optimization problem may then be posed, which maximizes the number of partial product bits removed with respect to the maximum absolute errors introduced at each node:

$\max \sum\limits_{i} R_{i}(e)$

$\text{subject to } \sum\limits_{\alpha} \lambda_{\alpha}e^{\alpha} \leq \eta$

The maximum partial product removal will be achieved by fully utilizing the error freedom, so the constraint can be set as equality:

$\max \sum\limits_{i} R_{i}(e)$

$\text{subject to } \sum\limits_{\alpha} \lambda_{\alpha}e^{\alpha} = \eta$

This can be solved by creating the Lagrangian using a Lagrange multiplier T:

$L = {{\sum\limits_{i}{R_{i}(e)}} + {T\left( {{\sum\limits_{\alpha}{\lambda_{\alpha}e^{\alpha}}} - \eta} \right)}}$

Turning points of L are potential optimal solutions to the optimization problem. Taking partial derivatives with respect to e_(j) (i.e. the error limit for a particular node j) gives rise to the following equation:

${{\sum\limits_{i}\frac{\partial R_{i}}{\partial e_{j}}} + {T{\sum\limits_{\alpha}{\lambda_{\alpha}\frac{\partial e^{\alpha}}{\partial e_{j}}}}}} = 0$

So an extremum of the original optimization occurs when:

$\begin{matrix}{{\sum\limits_{i}\frac{\partial R_{i}}{\partial e_{j}}} \propto {\sum\limits_{\alpha}{\lambda_{\alpha}\frac{\partial e^{\alpha}}{\partial e_{j}}}}} & (3)\end{matrix}$

The term $\frac{\partial R_{i}}{\partial e_{j}}$ in equation (3) above depends on the current state of all of the values of e, i.e. the current allocated allowed errors at each of the nodes (e.g. as set initially in block 304 and updated in block 310). These values can provide guidance in which direction to move the values of e (e.g. in block 310 of FIG. 3), but as shown in FIG. 3, a starting state must first be set (e.g. in block 304).

A starting state may be computed (in block 304) in any way. In an example, a starting state may be computed by assuming that, in the absence of any knowledge of R_(i) (i.e. the number of partial product bits removed at another node i), e_(j) (the allocated error limit at node j) only affects R_(j) (i.e. the number of partial product bits removed at that node j) and that R_(j) is proportional to log e_(j) (log e_(j) will be proportional to the number of partial product columns that could be truncated when exploiting the error e_(j)). Under this assumption, the following holds:

$\frac{\partial R_{i}}{\partial e_{j}} = 0 \text{ if } i \neq j$

$\frac{\partial R_{j}}{\partial e_{j}} \propto \frac{1}{e_{j}}$

So a potential starting point for use in the iterative method of FIG. 3 (i.e. for use in block 304) would be using values of e_(i) which satisfy the following equations for some constant T:

$\begin{matrix}{\sum\limits_{\alpha} \lambda_{\alpha}e_{j}\frac{\partial e^{\alpha}}{\partial e_{j}} = T \text{ for all } j} & (4a) \\ {\sum\limits_{\alpha} \lambda_{\alpha}e^{\alpha} = \eta} & (4b)\end{matrix}$

which can alternatively be written as follows, where q(λ, e)=Σ_(α)λ_(α)e^(α):

$\begin{matrix}{{e_{j}\frac{\partial q}{\partial e_{j}}} = {T\mspace{14mu}{for}\mspace{14mu}{all}\mspace{14mu} j}} & \left( {4a^{\prime}} \right) \\{{q\left( {\lambda,e} \right)} = \eta} & \left( {4b^{\prime}} \right)\end{matrix}$
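As a worked special case, offered as an illustration and assuming that q is linear in the error limits (i.e. every non-zero term of q contains exactly one e_(j) to the first power, with coefficient λ_(j)) and that there are n nodes, equations (4a′) and (4b′) can be solved in closed form:

```latex
\begin{aligned}
q(\lambda, e) &= \sum_{j=1}^{n} \lambda_{j} e_{j}
  &&\text{(assumed linear form)} \\
e_{j}\frac{\partial q}{\partial e_{j}} &= \lambda_{j} e_{j} = T
  &&\text{for all } j \text{, from (4a}'\text{)} \\
\eta &= \sum_{j=1}^{n} \lambda_{j} e_{j} = nT
  &&\text{from (4b}'\text{)} \\
e_{j} &= \frac{T}{\lambda_{j}} = \frac{\eta}{\lambda_{j}\, n}
\end{aligned}
```

Under this assumption each node receives an equal share η/n of the total error budget after weighting by λ_(j), which agrees with the initial allocation e_(i)=η/(λ_(i)n) used in the first example described in this document.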

Having established starting values for the errors allocated to each node in the DFG using equations (4a) and (4b) or (4a′) and (4b′) above or an alternative technique, each node i is optimized to exploit the allocated error e_(i) (block 306). There are a number of different ways in which this optimization may be performed. In many examples, the implementation of a node uses a single summation array (e.g. for the SOP or constant division operation) where all partial product bits are summed in parallel. An example of such a single summation array 500 is shown in FIG. 5 for an example SOP:
x₁y₁+x₂y₂+x₃y₃
As shown in FIG. 5, the summation array 500 comprises a first group of partial product bits 502 which correspond to the term x₁y₁, a second group of partial product bits 504 which correspond to the term x₂y₂ and a third group of partial product bits 506 which correspond to the term x₃y₃.

Standard reduction techniques can be applied to the single summation array 500 and internal products (e.g. x₁y₁ or x₂y₂ or x₃y₃) will not exist during the reduction. By using a single summation array, the area which is used when implementing the node is reduced significantly (e.g. by 10-20% for a given SOP implementation). A single summation array 500 may alternatively be referred to as a binary array.

In various examples, the optimization (in block 306) uses a truncation scheme which exhibits faithful rounding and which is applied to the single summation array. These truncation schemes use the principle that a faithfully rounded result can be achieved more efficiently (i.e. in less area of hardware logic) by removing some of the partial product bits in an array and performing some form of compensation for those removed bits, such as by adding a constant. This is shown graphically in FIG. 6 which shows a binary array 600 in which some of the least significant columns 602 are discarded and a constant 604 is added to compensate for the loss. Four particular truncation schemes are also shown graphically in FIG. 6: constant correction truncation (CCT) 610, variable correction truncation (VCT) 620, least mean square (LMS) 630 and a faithful rounding machine (FRator) 640. When using any of these schemes, the parameters used when removing bits and adding a constant (denoted C) must be selected to ensure faithful rounding. The first three of these schemes (CCT, VCT and LMS) work only when the operation is multiplication. The FRator truncation scheme works in any situation where there is a binary array.

In various examples, the FRator truncation scheme 640 (which is described in US Patent Application publication number US2013/0346927) is used. FRator, as described in the US Patent Application, is a faithful rounding technique which removes the maximum number of bits such that the result is faithfully rounded. However, when used to optimize a node in a DFG (in block 306 of FIG. 3), it is modified so that it removes the maximum number of bits without exceeding the allocated error for that node (and hence generates an updated single summation array). As in other truncation schemes, a constant is added to compensate for the bits that have been removed. In the graphical representation 640 in FIG. 6, the bits which are removed are shown as empty circles and denoted Δ and the bits which are added as compensation (and form constant C) are shown as circles with striped shading. This truncation scheme can be described in more detail with reference to FIGS. 7 and 8. FIG. 7 is a flow diagram of the technique and FIG. 8 shows the technique graphically.

As shown in FIG. 7, the method comprises calculating the maximum number of entire columns that can be removed from the summation array (for the particular node i in the DFG) whilst not exceeding the error e_(i) allocated to that node (block 704) and then removing all the bits in those identified columns (block 706). The maximum number of entire columns that can be removed can be calculated (in block 704) using a parameter k which is the largest number of complete (i.e. whole) least significant columns which can be removed from the summation array such that the allocated error is not exceeded. This may be written as:

$k = \max(r) \text{ such that } \sum\limits_{s=0}^{r-1} h_{s}2^{s} < e_{i} \qquad (5)$

where s is the column index (in the summation array) and h_(s) is the total number of bits in column s (i.e. the array height for column s).

Referring to the example shown in FIG. 8, for an allocated error e_(i)=32, the column heights are h₀=4, h₁=7, h₂=10. So if r=2, Σ_(s=0)^(r−1)h_(s)2^(s)=(4×1)+(7×2)=18 and if r=3, Σ_(s=0)^(r−1)h_(s)2^(s)=(4×1)+(7×2)+(10×4)=58 (which is greater than 32). Consequently, using equation (5), k=2 and two full columns of bits can be removed (in block 706) as shown in the diagram 804.

The number of bits, l_(k), to remove from the next least significant column (i.e. the column with index k) is then calculated (in block 708) using:

$l_{k} = \left\lceil \left(2^{-k}e_{i}\right) - 1 - \sum\limits_{s=0}^{k-1} h_{s}2^{s-k} \right\rceil \qquad (6)$

Referring to the example shown in FIG. 8 and for an allocated error e_(i)=32, k=2 from above and so l_(k)=⌈(2⁻²×32)−1−(4×2⁻²+7×2⁻¹)⌉=⌈8−1−1−3.5⌉=⌈2.5⌉=3. The calculated number of bits is then removed from column k (block 710) and this is shown in the diagram 804 in FIG. 8.

To compensate for the bits that have been removed (in blocks 706 and 710) a constant C is then added to the summation array (block 712) where:
C=e_(i)−1
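The calculations of blocks 704, 708 and 712 can be sketched as follows. This Python sketch is an illustration under stated assumptions rather than the FRator implementation itself: the allocated error is taken to be a positive integer, the column heights are supplied as a list indexed from the least significant column, and the function name frator_parameters is hypothetical:

```python
import math
from fractions import Fraction


def frator_parameters(heights, e):
    """Return (k, l_k, C) for a summation array and an allocated error e.

    heights[s] is the number of bits in column s (s = 0 is the least
    significant column); e is assumed to be a positive integer.
    """
    # Equation (5): largest k whose removed columns sum to less than e.
    k = 0
    removed = 0   # running value of sum(h_s * 2**s) over removed columns
    while k < len(heights) and removed + heights[k] * 2 ** k < e:
        removed += heights[k] * 2 ** k
        k += 1
    # Equation (6): further bits removed from column k, if such a column exists.
    if k < len(heights):
        l_k = math.ceil(Fraction(e - removed, 2 ** k) - 1)
    else:
        l_k = 0
    # Compensating constant as stated for block 712.
    c = e - 1
    return k, l_k, c


k, l_k, c = frator_parameters([4, 7, 10], 32)   # the FIG. 8 example values
```

For the example values above this gives k=2 and l_(k)=3, matching the worked example, and the total value of the removed bits is 18+3×4=30, which is below the allocated error of 32.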

Having optimized each node (in block 306), e.g. using FRator as described above, equation (3) can be used iteratively (in block 310) to improve those starting errors by solving equations (4a) and (4b) above for e_(i) for some constant T. The values of $\frac{\partial R_{i}}{\partial e_{j}}$ for all i and j are computed, where this is the rate of change of the number of partial product bits removed with respect to the error (for the newly optimized summation array generated in block 306), and while the following are not equal for all j:

$\left\{ \frac{1}{\sum\limits_{i} \frac{\partial R_{i}}{\partial e_{j}}} \sum\limits_{\alpha} \lambda_{\alpha}\frac{\partial e^{\alpha}}{\partial e_{j}} \right\} \qquad (6a)$

e is replaced by f (where f is a vector of updated error allocations for each node) where f satisfies, for some constant T:

$\begin{matrix}{f_{j}\sum\limits_{\alpha} \lambda_{\alpha}\frac{\partial f^{\alpha}}{\partial f_{j}} = T e_{j}\sum\limits_{i} \frac{\partial R_{i}}{\partial e_{j}} \text{ for all } j} & (7a) \\ {\sum\limits_{\alpha} \lambda_{\alpha}f^{\alpha} = \eta} & (7b)\end{matrix}$

which may alternatively be written as:

$\begin{matrix}{{e_{j}{\sum\limits_{i}\;\frac{\partial R_{i}}{\partial e_{j}}}} \propto {f_{j}\;\frac{\partial q}{\partial f_{j}}}} & \left( {7a^{\prime}} \right) \\{{q\left( {\lambda,f} \right)} = \eta} & \left( {7b^{\prime}} \right)\end{matrix}$

The new allocated error values (now equal to f) are then used to further optimize each node (in block 306), e.g. using FRator as described above. This iterative process moves the variables e_(i) towards a point at which equation (3) is satisfied.

The method shown in FIG. 3 and described above can be further explained with reference to three examples. The first is a cubic polynomial, the second is a bilinear interpolation (which requires constant division) and the third is an instance of a Gabor filter (which, unlike the first two examples, is non-linear in the ε_(i) variables).

In the first example, the polynomial is:
p=at³+2^(n)bt²+2^(2n)ct+2^(3n)d
where t is an n-bit unsigned integer, and the polynomial may be represented as the DFG 901 shown in FIG. 9. If the whole polynomial must be computed to give a faithfully rounded result, the globally acceptable error is:
η=2^(3n)−1
Errors can be introduced into each of the three operators 902-904 shown in the DFG (with the maximum absolute allowable error for node i denoted e_(i)) and if the actual error introduced at each node is denoted ε₁, ε₂, ε₃, the value returned for the polynomial with these errors, p̂, is:
p̂=a((t²+ε₁)t+ε₂)+2^(n)b(t²+ε₁)+2^(2n)ct+2^(3n)d+ε₃
The globally acceptable error requires that:
|p−p̂|≤η
and therefore:
|(at+2^(n)b)ε₁+aε₂+ε₃|≤2^(3n)−1
which is linear in variables ε_(i).
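The step from the expression for the value with errors to the linear error condition can be checked numerically. The following Python sketch, given as an illustration only, assumes that a, b, c, d and t are all n-bit unsigned integers and uses hypothetical function names; it confirms exhaustively for n=2 that the difference between the value with errors and the exact value equals (at+2^(n)b)ε₁+aε₂+ε₃:

```python
from itertools import product


def p_exact(a, b, c, d, t, n):
    return a * t**3 + 2**n * b * t**2 + 2**(2 * n) * c * t + 2**(3 * n) * d


def p_with_errors(a, b, c, d, t, n, eps):
    e1, e2, e3 = eps
    return (a * ((t * t + e1) * t + e2) + 2**n * b * (t * t + e1)
            + 2**(2 * n) * c * t + 2**(3 * n) * d + e3)


def linear_error(a, b, t, n, eps):
    e1, e2, e3 = eps
    return (a * t + 2**n * b) * e1 + a * e2 + e3


n = 2
mismatches = 0
for a, b, c, d, t in product(range(2**n), repeat=5):
    for eps in product((-1, 0, 2), repeat=3):   # arbitrary sample errors
        difference = p_with_errors(a, b, c, d, t, n, eps) - p_exact(a, b, c, d, t, n)
        if difference != linear_error(a, b, t, n, eps):
            mismatches += 1
```

Because the identity is polynomial, agreement on this grid is consistent with the algebraic expansion; the sign of the difference does not matter once the absolute value is taken.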

As a result of the linearity, the initial errors may be set (in block 304) to:
e_(i)=η/(λ_(i)n)
where, in this expression only, n is the number of operators (n=3 in this example, as distinct from the bit width n that appears in the expressions for p, η and λ_(i)) and the λ_(i) terms are the maximum absolute values of the coefficient at each operator, so in this example:
λ₁=max|at+2^(n)b|=(2^(n)−1)(2^(n+1)−1)
λ₂=max|a|=(2^(n)−1)
λ₃=1
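A minimal sketch of this initial allocation is given below, assuming that a, b and t are n-bit unsigned integers and using exact rational arithmetic. The function names cubic_lambdas and initial_errors are hypothetical:

```python
from fractions import Fraction
from itertools import product


def cubic_lambdas(n_bits):
    """Maximum absolute coefficients for the three operators of the cubic."""
    m = 2**n_bits - 1
    return [m * (2**(n_bits + 1) - 1), m, 1]


def initial_errors(eta, lambdas):
    """e_i = eta / (lambda_i * number of operators)."""
    n_ops = len(lambdas)
    return [Fraction(eta, lam * n_ops) for lam in lambdas]


n_bits = 8
eta = 2**(3 * n_bits) - 1
lambdas = cubic_lambdas(n_bits)
errors = initial_errors(eta, lambdas)

# Brute-force confirmation of lambda_1 for a small bit width (3 bits).
lambda_1_small = max(a * t + 2**3 * b for a, b, t in product(range(8), repeat=3))
```

With this allocation the weighted errors sum exactly to η, so the whole of the error budget is in use before any iteration takes place.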

These errors are then used to optimize each node (in block 306), e.g.using FRator as described above. This involves array truncations, aconstant addition and removal of the least significant k columns plusl_(k) bits from column k. The updated error allocations are thencalculated (in block 310) by computing

$\frac{\partial R_{i}}{\partial e_{j}}$ for all i and j and, while the following are not equal for all j:

$\left\{ \frac{\lambda_{j}}{\sum\limits_{i}\;\frac{\partial R_{i}}{\partial e_{j}}} \right\}$ replacing e by f, where f satisfies:

$\begin{matrix}{f_{j} = \frac{\eta\; e_{j}{\sum\limits_{i}\;\frac{\partial R_{i}}{\partial e_{j}}}}{\lambda_{j}{\sum\limits_{i,j}\;{e_{j}\frac{\partial R_{i}}{\partial e_{j}}}}}} & (8)\end{matrix}$

The values

$\frac{\partial R_{i}}{\partial e_{j}}$ can be computed by considering the extra amount of error required at node j to remove another partial product bit from the single array for another node i. If, as described above, the truncation removes k columns from the array, then removing another bit would require increasing the total value of all the removed bits by 2^(k) and, for the bound still to be satisfied, e must be increased by 2^(k−1). So a value for

$\frac{\partial R_{i}}{\partial e_{j}}$ is:

$\begin{matrix}{\frac{\partial R_{i}}{\partial e_{j}} = \left\{ \begin{matrix}{\frac{1}{2^{k_{i} - 1}},} & {j = i} \\{0,} & \text{otherwise}\end{matrix} \right.} & (9)\end{matrix}$

where k_(i) is the number of truncated columns in the ith operator.

Based on the algorithm held within equation (8) above in light of this, the method shown in FIG. 3 can be simplified as follows whenever p̂ is linear in the variables ε_(i) and the DFG consists only of SOPs. The initially allocated errors are set (in block 304) to:

e_(i) = η/(λ_(i) n)

where n is the number of operators. The optimization within each node (in block 306) comprises computing the number of truncated columns k_(i) (which, as described above, results in exploiting error e_(i) in operator i) and then, while the following are not equal for all i:

{λ_(i) 2^(k_(i))}

the error allocations are updated (in block 310) by replacing e_(j) by the following:

$\frac{\eta\; e_{j}2^{- k_{j}}}{\lambda_{j}{\sum\limits_{i}\;{e_{i}2^{- k_{i}}}}}$
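The SOP-only update above can be sketched in a few lines. This is a minimal illustration, assuming the truncated column counts k_(i) have already been produced by the per-node optimization; the function names `update_errors_sop` and `is_balanced` are hypothetical and exact rational arithmetic is used only to keep the check of the error budget exact.

```python
from fractions import Fraction


def update_errors_sop(eta, e, lam, k):
    """One update of the error allocations for a linear, SOP-only DFG.

    e[j]   : current maximum allowable error at operator j
    lam[j] : lambda_j, maximum absolute coefficient of eps_j
    k[j]   : number of truncated columns found when optimizing operator j
    """
    # e_j * 2^(-k_j) for every operator
    weights = [Fraction(ej) / 2**kj for ej, kj in zip(e, k)]
    total = sum(weights)
    # f_j = eta * e_j * 2^(-k_j) / (lambda_j * sum_i e_i * 2^(-k_i))
    return [eta * w / (lj * total) for w, lj in zip(weights, lam)]


def is_balanced(lam, k):
    """Stop test: the values lambda_i * 2^(k_i) are equal for all i."""
    return len({lj * 2**kj for lj, kj in zip(lam, k)}) == 1


if __name__ == "__main__":
    # Illustrative numbers only: cubic example with n = 2 and made-up k values.
    f = update_errors_sop(63, [1, 7, 21], [21, 3, 1], [1, 2, 4])
    print(f, sum(l * fj for l, fj in zip([21, 3, 1], f)))
```

By construction the updated allocations still satisfy Σλ_(j)f_(j) = η, so each update redistributes the error budget without changing its total.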

In the second example, the polynomial is:

$p = {{{\left( {1 - \frac{t}{2^{n} - 1}} \right)\left( {1 - \frac{s}{2^{n} - 1}} \right)a} + {\left( {1 - \frac{t}{2^{n} - 1}} \right)\frac{s}{2^{n} - 1}b} + {\frac{t}{2^{n} - 1}\left( {1 - \frac{s}{2^{n} - 1}} \right)c} + {\frac{t}{2^{n} - 1}\frac{s}{2^{n} - 1}d}} = \frac{{\overset{\_}{t}\;\overset{\_}{s}\; a} + {\overset{\_}{t}\; s\; b} + {t\;\overset{\_}{s}\; c} + {t\; s\; d}}{\left( {2^{n} - 1} \right)^{2}}}$

where $\overset{\_}{t}$ is the bitwise inversion of t (and likewise $\overset{\_}{s}$ of s) and the absolute error bound is η=1. An example architecture that may be used is interpolation with respect to s followed by t:

$\begin{matrix}{s_{a,b} = \frac{{\overset{\_}{s}\; a} + {s\; b}}{2^{n} - 1}} \\{s_{c,d} = \frac{{\overset{\_}{s}\; c} + {s\; d}}{2^{n} - 1}} \\{p = \frac{{\overset{\_}{t}\; s_{a,b}} + {t\; s_{c,d}}}{2^{n} - 1}}\end{matrix}$

The corresponding DFG is shown in the second example 902 in FIG. 9, which comprises 6 operators and hence has 6 potential locations for errors to be introduced. As above, the maximum absolute allowable error at a node (operator) is denoted e_(i) and the actual error at each node is denoted ε_(i).

The implementation therefore returns the following value:

$\begin{matrix}{{\hat{s}}_{a,b} = {\frac{{\overset{\_}{s}\; a} + {s\; b} + ɛ_{1}}{2^{n} - 1} + ɛ_{3}}} \\{{\hat{s}}_{c,d} = {\frac{{\overset{\_}{s}\; c} + {s\; d} + ɛ_{2}}{2^{n} - 1} + ɛ_{4}}} \\{\hat{p} = {\frac{{\overset{\_}{t}\;{\hat{s}}_{a,b}} + {t\;{\hat{s}}_{c,d}} + ɛ_{5}}{2^{n} - 1} + ɛ_{6}}}\end{matrix}$

In this design p̂ is again linear in the variables ε_(i), which means that the linear version of the algorithm held within equation (8) above is still appropriate but, unlike the first example, this example exploits the error freedom in constant division.

The initially allocated errors are set (in block 304) to:

e_(i) = η/(λ_(i) n)

where n is the number of operators (6 in this example) and:

$\lambda_{1} = {{\max{\frac{\partial\hat{p}}{\partial ɛ_{1\;}}}} = {{\max{\frac{\overset{\_}{t}}{\left( {2^{n} - 1} \right)^{2}}}} = \frac{1}{2^{n - 1}}}}$$\lambda_{2} = {{\max{\frac{\partial\hat{p}}{\partial ɛ_{2}}}} = {{\max{\frac{t}{\left( {2^{n} - 1} \right)^{2}}}} = \frac{1}{2^{n} - 1}}}$$\lambda_{3} = {{\max{\frac{\partial\hat{p}}{\partial ɛ_{3}}}} = {{\max{\frac{\overset{\_}{t}}{2^{n} - 1}}} = 1}}$$\lambda_{4} = {{\max{\frac{\partial\hat{p}}{\partial ɛ_{4}}}} = {{\max{\frac{\overset{\_}{t}}{2^{n} - 1}}} = 1}}$$\lambda_{5} = {{\max{\frac{\partial\hat{p}}{\partial ɛ_{5}}}} = {{\max{\frac{1}{2^{n} - 1}}} = 1}}$$\lambda_{6} = {{\max{\frac{\partial\hat{p}}{\partial ɛ_{6}}}} = 1}$So:

$e_{1} = e_{2} = e_{5} = \frac{2^{n} - 1}{6}$

$e_{3} = e_{4} = e_{6} = \frac{1}{6}$
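The initial allocation for the bilinear example can be reproduced numerically. The sketch below is illustrative only: it takes λ₁ = λ₂ = λ₅ = 1/(2^(n)−1) and λ₃ = λ₄ = λ₆ = 1, which are the values consistent with the e_(i) stated above for η=1 and six operators, and it uses the fact that the bitwise inversion of an n-bit value t equals (2^(n)−1)−t. The function names are hypothetical.

```python
from fractions import Fraction


def bilinear_lambdas(n):
    """Lambda values for the six operators (assumed as described in the lead-in)."""
    m = 2**n - 1
    small, one = Fraction(1, m), Fraction(1)
    return [small, small, one, one, small, one]


def bilinear_initial_errors(n, eta=1):
    """Initial allocation e_i = eta / (lambda_i * number_of_operators)."""
    lams = bilinear_lambdas(n)
    return [Fraction(eta) / (lam * len(lams)) for lam in lams]


def brute_force_lambda1(n):
    """Maximise |t_bar / (2^n - 1)^2| over all n-bit t, where t_bar = (2^n - 1) - t."""
    m = 2**n - 1
    return max(Fraction(m - t, m * m) for t in range(2**n))


if __name__ == "__main__":
    print(bilinear_initial_errors(8))
    print(brute_force_lambda1(4))
```

For n=8 this gives e₁ = e₂ = e₅ = 255/6 and e₃ = e₄ = e₆ = 1/6, and the weighted sum Σλ_(i)e_(i) equals η.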

As set out above, optimizing an SOP node comprises calculating the value of k, and this is then used when updating error allocations to calculate a value for

$\frac{\partial R_{i}}{\partial e_{j}}$ using equation (9). This, however, cannot be used for constant division nodes.

It has been shown that the constant division of an unsigned m-bit number x by d can be implemented as (ax+b)>>k, where ‘>>k’ denotes a right binary shift by variable k. For a faithfully rounded scheme, the optimal value of k is defined as follows:

$\begin{matrix}{k_{opt} = {\min\left( {k^{+},k^{-}} \right)}} & \left( {10a} \right) \\{k^{\pm} = {\min\left( {{k\text{:}\frac{2^{k}}{\left( {\mp 2^{k}} \right){mod}\; d}} \geq \left\lfloor \frac{2^{m}}{d} \right\rfloor} \right)}} & \left( {10b} \right)\end{matrix}$

Allowable error is introduced into the constant division by dividing x by 2^(r) for some r<0. The value of k must then satisfy:

$k^{\pm} = {\min\left( {{k\text{:}\frac{2^{k}}{\left( {\mp 2^{k}} \right){mod}\; d}} > \left\lfloor \frac{2^{m - r}}{d} \right\rfloor} \right)}$

If the error is doubled then r increases by one and k_(opt) will change if one of the following ∓ conditions holds:

$\frac{2^{k_{opt} - 1}}{\left( {\mp 2^{k_{opt} - 1}} \right){mod}\; d} > \left\lfloor \frac{2^{m - r - 1}}{d} \right\rfloor$

If this condition holds, k would be decreased, which will remove one partial product row from the product ax, removing m bits. Whether or not this condition is satisfied, the output of the constant divisor has been reduced by 1 bit. If the result of the constant divisor is subsequently used in another operator, further partial product bits will be removed.
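The search for the shift value can be sketched as a direct loop over k. The following is a minimal sketch under stated assumptions: it uses the strict inequality of the relaxed condition above with threshold ⌊2^(m−r)/d⌋ (taking r=0 gives the threshold of equation (10b)), it treats a zero remainder as an unbounded ratio, and it relies on Python's `%` returning a non-negative result for a positive modulus. The function names are hypothetical, and the multiplier a and addend b of (ax+b)>>k are not computed here.

```python
def shift_candidates(d, m, r=0):
    """Return (k_plus, k_minus) for dividing an m-bit unsigned value by d.

    k_plus uses (-2^k) mod d and k_minus uses (+2^k) mod d, i.e. the upper
    and lower signs of the "-+" in the condition. Assumes d >= 1 and m >= r.
    """
    bound = 2**(m - r) // d  # floor(2^(m - r) / d)

    def smallest_k(sign):
        k = 0
        while True:
            rem = (sign * 2**k) % d  # non-negative in Python for d > 0
            # 2^k / rem > bound, rewritten without division; rem == 0 is
            # treated as satisfying the condition (assumption).
            if rem == 0 or 2**k > bound * rem:
                return k
            k += 1

    return smallest_k(-1), smallest_k(+1)


def k_opt(d, m, r=0):
    """k_opt = min(k_plus, k_minus), as in equation (10a)."""
    return min(shift_candidates(d, m, r))


if __name__ == "__main__":
    print(shift_candidates(3, 8), k_opt(3, 8), k_opt(3, 8, r=1))
```

For d=3 and m=8 this yields k⁺=7 and k⁻=8, so k_(opt)=7; relaxing the threshold by one bit (r=1) lowers k_(opt) to 6 in this sketch, which illustrates how additional error freedom can shorten the shift.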

If n_(i,j) denotes the bit width of the variable that the output of operator i is multiplied by in operator j, and if operator i is a constant division operator with input bit width n_(q,i) for some q, the constant for division is d_(i) and k_(opt) for the operator is k_(i), then:

$\begin{matrix}{\frac{\partial R_{i}}{\partial e_{j}} = \left\{ \begin{matrix}\frac{n_{q,i}}{2e_{i}} & {j = i\ \text{and cond holds}} \\0 & {j = i\ \text{and cond does not hold}} \\\frac{n_{i,j}}{2e_{i}} & \text{otherwise}\end{matrix} \right.} & (11)\end{matrix}$

$\text{cond} = \left( {\frac{2^{k_{i} - 1}}{\left( {\mp 2^{k_{i} - 1}} \right){mod}\; d_{i}} > \left\lfloor \frac{2^{n_{q,i} - r_{i} - 1}}{d_{i}} \right\rfloor} \right)$

where e_(i)=u_(i)2^(r_(i)) and u_(i)∈[1,2).

Based on the algorithm held within equation (8) above in light of this, the method shown in FIG. 3 can be simplified as follows whenever p̂ is linear in the variables ε_(i) and the DFG consists of SOPs and constant divisions. The initially allocated errors are set (in block 304) to:

e_(i) = η/(λ_(i) n)

where n is the number of operators. The optimization within each node (in block 306) comprises, for SOPs, computing the number of truncated columns k_(i) (which, as described above, results in exploiting error e_(i) in node i) and, for constant divisors, computing the constant shift value k_(i) (which, as described above, results in exploiting error e_(i) in node i). Then, while the following are not equal for all j:

$\left\{ \frac{\lambda_{j}}{\sum\limits_{i}\frac{\partial R_{i}}{\partial e_{j}}} \right\}$ the error allocations are updated (in block 310) by replacing e_(j) by the following:

$\frac{\eta\; e_{j}{\sum\limits_{i}\frac{\partial R_{i}}{\partial e_{j}}}}{\lambda_{j}{\sum\limits_{i,j}{e_{j}\frac{\partial R_{i}}{\partial e_{j}}}}}$ where the values for

$\frac{\partial R_{i}}{\partial e_{j}}$ are calculated using equation (9) for SOP nodes and equation (11) for constant division nodes.
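The general linear update can be sketched as a function of a matrix of rates. This is a minimal sketch, assuming the rates ∂R_(i)/∂e_(j) have already been evaluated (by equation (9) or (11)) and stored as `dR[i][j]`; the names `update_errors` and `sop_rate` are hypothetical, and the sketch assumes at least one rate is non-zero so that the denominator does not vanish.

```python
from fractions import Fraction


def sop_rate(k_i):
    """Diagonal rate for an SOP node from equation (9): 1 / 2^(k_i - 1)."""
    return Fraction(2, 2**k_i)


def update_errors(eta, e, lam, dR):
    """Replace each e_j by f_j as in equation (8).

    dR[i][j] holds the rate of change of bits removed at node i with
    respect to the error bound at node j.
    """
    n = len(e)
    # Column sums: sum over i of dR_i/de_j, one value per j
    col = [sum(Fraction(dR[i][j]) for i in range(n)) for j in range(n)]
    denom = sum(e[j] * col[j] for j in range(n))  # sum over i, j of e_j * dR_i/de_j
    return [eta * e[j] * col[j] / (lam[j] * denom) for j in range(n)]


if __name__ == "__main__":
    # Illustrative SOP-only case: diagonal rates from made-up k values.
    k = [1, 2, 4]
    dR = [[sop_rate(k[i]) if i == j else 0 for j in range(3)] for i in range(3)]
    print(update_errors(63, [1, 7, 21], [21, 3, 1], dR))
```

For constant division nodes the off-diagonal entries of `dR` would be non-zero, which is how error freedom at one node is credited with bits removed at downstream nodes.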

In the third example, p̂ is non-linear in the variables ε_(i), as arises, for example, when implementing an instance of a Gabor filter. The DFG for the architecture is shown in the third example 903 in FIG. 9 and comprises 5 operators (and hence has 5 potential locations for errors to be introduced). In this example, the polynomial equations can be solved via Gröbner bases techniques.

FIG. 10 shows a more detailed implementation of the method shown in FIG. 3 for a polynomial p. The method comprises receiving (by the lossy synthesizer) a DFG representing the polynomial p (block 302). The DFG which is received (in block 302) has an architecture consisting only of operations which are SOPs or constant divisions. A version of the polynomial (denoted p̂) is then constructed which has an additive error ε_(i) for each operator (block 1003). The difference between the original polynomial p and the polynomial with error p̂ is then expanded with respect to the variables ε_(i) (block 1004) to produce an expansion of the form:

${p - \hat{p}} = {\sum\limits_{\alpha}{{c_{\alpha}(x)}ɛ^{\alpha}}}$

The variables λ_(α) are then calculated (block 1005), where these are defined as follows:

$\lambda_{\alpha} = {\max\limits_{x}\left| {c_{\alpha}(x)} \right|}$

These may, for example, be calculated by iterative use of Satisfiability Modulo Theory (SMT).

An initial error for each operator (i.e. each node in the DFG) is calculated (block 304). These initial errors e_(i), the starting maximum error bound for each operator, can be provided by the user or they may be determined by solving the following equations via Gröbner bases, which include the maximum absolute error tolerance η defined by the user:

$\begin{matrix}{{\sum\limits_{\alpha}{\lambda_{\alpha}e_{j}\frac{\partial e^{\alpha}}{\partial e_{j}}}} = {T\mspace{14mu}{for}\mspace{14mu}{all}\mspace{14mu} j}} & \left( {4a} \right) \\{{\sum\limits_{\alpha}{\lambda_{\alpha}e^{\alpha}}} = \eta} & \left( {4b} \right)\end{matrix}$
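As a consistency check, equations (4a) and (4b) appear to reduce to the closed-form allocation used in the linear examples. The short derivation below assumes the linear case, in which every monomial e^α is a single variable e_(j) and n denotes the number of operators; it is a worked illustration rather than a statement from the method itself.

```latex
% Linear case: the expansion has one term per operator, so e^{\alpha} = e_j.
\begin{align*}
\frac{\partial e^{\alpha}}{\partial e_j} &=
  \begin{cases} 1, & \alpha = j \\ 0, & \text{otherwise} \end{cases}
  && \text{(each monomial is a single variable)} \\
\lambda_j e_j &= T \quad \text{for all } j
  && \text{(from (4a))} \\
\sum_{j=1}^{n} \lambda_j e_j &= nT = \eta
  && \text{(from (4b))} \\
e_j &= \frac{\eta}{\lambda_j n}
  && \text{(substituting } T = \eta / n \text{)}
\end{align*}
```

Under that assumption the result matches the initial allocation e_(i)=η/(λ_(i)n) used for the cubic and bilinear examples.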

Each operator is then optimized to exploit the allocated error e_(i) (which is referred to as the maximum absolute allowable error for the operator) and to calculate a value k_(i) (block 1006). How this is done is different depending upon whether the operator is a SOP or constant division operator. For an SOP operator, a single partial product array is created and using the FRator technique described above (i.e. equation (5)) the value of k_(i) is calculated:

$\begin{matrix}{k = {\max(r)\ \text{such that}\ {\sum\limits_{s = 0}^{r - 1}{h_{s}2^{s}}} < e_{i}}} & (5)\end{matrix}$

For a constant division operator which requires the division by d, the value of k_(i) is calculated using equations (10a) and (10b) from above.

$\begin{matrix}{k_{opt} = {\min\left( {k^{+},k^{-}} \right)}} & \left( {10a} \right) \\{k^{\pm} = {\min\left( {{k\text{:}\frac{2^{k}}{\left( {\mp 2^{k}} \right){mod}\; d}} > \left\lfloor \frac{2^{m}}{d} \right\rfloor} \right)}} & \left( {10b} \right)\end{matrix}$
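The column-truncation rule of equation (5) can be sketched as a running sum over the least significant columns. This is a minimal sketch, assuming h_(s) is the number of bits in column s of the single partial product array (least significant column first) and that e_(i) is positive; the function name is hypothetical, and the constant addition and the extra l_(k) bits removed from column k are not modelled.

```python
def truncated_columns(heights, e):
    """Largest r such that sum_{s < r} heights[s] * 2^s < e.

    heights[s] : number of bits in column s (assumed meaning of h_s)
    e          : maximum absolute allowable error for the operator
    """
    removed = 0  # total value of all bits in the columns removed so far
    k = 0
    for s, h in enumerate(heights):
        removed += h * 2**s
        if removed >= e:
            break  # removing column s as well would reach or exceed e
        k = s + 1
    return k


if __name__ == "__main__":
    print(truncated_columns([1, 2, 3, 4], 10))
```

For column heights [1, 2, 3, 4] and e=10, two whole columns can be removed (total value 1+4=5), whereas removing a third would bring the total to 17.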

The rate of change in the number of partial product bits with changing error,

$\frac{\partial R_{j}}{\partial e_{i}},$ is then calculated for each operator j. For an SOP operator this uses equation (9) above:

$\begin{matrix}{\frac{\partial R_{j}}{\partial e_{i}} = \left\{ \begin{matrix}{\frac{1}{2^{k_{j} - 1}},} & {j = i} \\{0,} & \text{otherwise}\end{matrix} \right.} & (9)\end{matrix}$

and for a constant division operator, equation (11) is used:

$\begin{matrix}{\frac{\partial R_{j}}{\partial e_{i}} = \left\{ \begin{matrix}\frac{n_{q,i}}{2e_{i}} & {j = i\ \text{and cond holds}} \\0 & {j = i\ \text{and cond does not hold}} \\\frac{n_{i,j}}{2e_{i}} & \text{otherwise}\end{matrix} \right.} & (11)\end{matrix}$

$\text{cond} = \left( {\frac{2^{k_{i} - 1}}{\left( {\mp 2^{k_{i} - 1}} \right){mod}\; d_{i}} > \left\lfloor \frac{2^{n_{q,i} - r_{i} - 1}}{d_{i}} \right\rfloor} \right)$

where n_(q,i) is the input bit width into the operator, e_(i)=u_(i)2^(r_(i)) and u_(i)∈[1,2). The input bit widths are recalculated at each iteration for all nodes in the DFG (e.g. as part of the optimization process of block 1006) as they affect both the constant division operators (as in equation (11)) and SOP operators (where the bit widths affect the number of bits per column in the array).

The method is iterated until a maximum number of iterations is reached (block 308 returns ‘Yes’), a maximum runtime is reached, or the following are equal for all j (‘Yes’ in block 1008):

$\begin{matrix}\left\{ {\frac{1}{\sum_{i}\frac{\partial R_{i}}{\partial e_{j}}}{\sum\limits_{\alpha}{\lambda_{\alpha}\frac{\partial e^{\alpha}}{\partial e_{j}}}}} \right\} & \left( {6a} \right)\end{matrix}$

The condition expressed in equation (6a) above (and block 1008 in FIG. 10) is the same as the condition described above with reference to block 312 in FIG. 3. In the event that the method terminates at a maximum number of iterations (‘Yes’ in block 308) or maximum runtime, the iteration that has the smallest range of values according to equation (6a) may be returned.
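Selecting the iteration with the smallest range of values can be sketched as follows. This is a minimal sketch restricted to the linear case, where the values of equation (6a) reduce to λ_(j) divided by the column sum of the rates; the history format (a list of `(lam, dR)` pairs, one per iteration) and the function names are assumptions made for illustration, and every column sum is assumed to be non-zero.

```python
def balance_values(lam, dR):
    """Values of equation (6a) in the linear case: lambda_j / sum_i dR[i][j]."""
    n = len(lam)
    return [lam[j] / sum(dR[i][j] for i in range(n)) for j in range(n)]


def balance_range(lam, dR):
    """Spread of the values; zero means they are equal for all j."""
    vals = balance_values(lam, dR)
    return max(vals) - min(vals)


def pick_iteration(history):
    """Index of the iteration whose values have the smallest range."""
    return min(range(len(history)), key=lambda t: balance_range(*history[t]))


if __name__ == "__main__":
    lam = [4, 2, 1]
    unbalanced = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    balanced = [[2, 0, 0], [0, 1, 0], [0, 0, 0.5]]
    print(pick_iteration([(lam, unbalanced), (lam, balanced)]))
```

A range of zero corresponds to the ‘Yes’ outcome of the equality test; when the iteration limit is reached first, the smallest non-zero range identifies the candidate to return.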

If the method does not terminate at block 1008 (i.e. the values of equation (6a) are not equal for all j), the allocated errors for each operator are updated by replacing the current error allocations, e, with updated error allocations f, by solving the following equations using Gröbner bases (block 1009):

$\begin{matrix}{{f_{j}{\sum\limits_{\alpha}{\lambda_{\alpha}\frac{\partial f^{\alpha}}{\partial f_{j}}}}} = {T\, e_{j}{\sum\limits_{i}\frac{\partial R_{i}}{\partial e_{j}}}\ \text{for all}\ j}} & \left( {7a} \right) \\{{\sum\limits_{\alpha}{\lambda_{\alpha}f^{\alpha}}} = \eta} & \left( {7b} \right)\end{matrix}$

which may alternatively be written as:

$\begin{matrix}{{e_{j}{\sum\limits_{i}\frac{\partial R_{i}}{\partial e_{j}}}} \propto {f_{j}\frac{\partial q}{\partial f_{j}}}} & \left( {7a^{\prime}} \right) \\{{q\left( {\lambda,f} \right)} = \eta} & \left( {7b^{\prime}} \right)\end{matrix}$

This procedure requires array creation for the SOP operators and may be implemented where the operators can be implemented via the summation of any binary array.

By using the methods described above (e.g. as shown in FIGS. 3 and 10), a hardware logic design (e.g. in RTL) can be generated which occupies a smaller area when fabricated (e.g. a smaller area of silicon) but still guarantees to meet a defined error requirement (which may be faithful rounding of the result of the entire polynomial). This aids miniaturization of components and the devices (e.g. smartphones, tablet computers and other computing devices) in which the components are used. In addition, or instead, it enables more functionality to be implemented within a similar area of silicon chip. By reducing the physical size (i.e. area) that is used, more ICs can be fabricated from a single silicon wafer, which reduces the overall cost per die.

In the methods described above (and shown in FIGS. 3 and 10), the polynomial is represented as a DFG comprising only SOPs and constant divisions and input to the lossy synthesizer (in block 302) and initial errors are allocated to each node (i.e. operator) in the DFG (in block 304). It will be appreciated that there may be many different architectures which could be used for a particular polynomial and many different ways that the initial errors could be allocated to nodes (when implementing block 304). These different architectures and/or error allocations may lead to different RTL implementations for a polynomial being generated.

In various examples, more than one RTL implementation for a polynomial may be generated (e.g. a first RTL implementation using the initially allocated errors and a second RTL implementation using the final updated allocated errors) and the resulting RTL implementations may be compared using a heuristic hardware cost function given below which uses a unit area cost model for AND and OR gates and a unit area cost of two for XOR gates. An example of such a cost function for an arbitrary binary array, with width w, height h and a number pp of bits in the array, has the form:

$\text{area}\left( \text{array}\left( {h,w,pp} \right) \right) \approx {{7\, pp\left( {1 - \left( \frac{2}{3} \right)^{f}} \right)} + {\frac{w}{2}\left( {{2\left\lceil {\log_{2}w} \right\rceil} + 11} \right)}}$

where $f = \left\lceil \frac{{\log_{2}h} - 1}{{\log_{2}3} - 1} \right\rceil$

By using this cost function, several candidate RTL implementations may be compared and one may be selected to be input to a logic synthesizer to transition the RTL implementation into a gate level description which can then be laid out and implemented in silicon. This is a more efficient process than having to perform the logic synthesis and layout of each of the candidate RTL implementations to determine which is most efficient in terms of area requirements (i.e. which implementation requires the least area of hardware logic).
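The heuristic cost function can be evaluated directly when comparing candidate arrays. The sketch below transcribes the formula as written; the function name `array_area`, the variable names for the two terms and the clamping of f to zero for very small heights (where the formula as written would give a negative exponent) are assumptions made for illustration only.

```python
import math


def array_area(h, w, pp):
    """Heuristic area of a binary array of height h, width w and pp bits.

    Assumes h >= 1 and w >= 1.
    """
    f = math.ceil((math.log2(h) - 1) / (math.log2(3) - 1))
    f = max(f, 0)  # assumption: clamp so that h = 1 does not give a negative f
    bits_term = 7 * pp * (1 - (2 / 3)**f)  # term that grows with pp and h
    width_term = (w / 2) * (2 * math.ceil(math.log2(w)) + 11)  # term in w only
    return bits_term + width_term


if __name__ == "__main__":
    # Compare two hypothetical candidate arrays of the same width.
    candidates = {"A": (4, 8, 20), "B": (8, 8, 20)}
    best = min(candidates, key=lambda name: array_area(*candidates[name]))
    print(best, {name: array_area(*c) for name, c in candidates.items()})
```

Because only h, w and pp are needed, candidates can be ranked before any logic synthesis or layout is attempted.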

Although the cost function above is described with reference to lossy synthesis, the cost function allows for the costing of any array summation and can be applied in other applications beyond lossy synthesis.

FIG. 11 illustrates various components of an exemplary computing-based device 1100 which may be implemented as any form of a computing and/or electronic device, and in which embodiments of the methods described herein may be implemented.

Computing-based device 1100 comprises one or more processors 1102 which may be microprocessors, controllers or any other suitable type of processors for processing computer executable instructions to control the operation of the device in order to perform the methods described herein (e.g. the method of FIG. 3 or FIG. 10). In some examples, for example where a system on a chip architecture is used, the processors 1102 may include one or more fixed function blocks (also referred to as accelerators) which implement a part of the method of generating RTL for a polynomial in hardware (rather than software or firmware). Platform software comprising an operating system 1104 or any other suitable platform software may be provided at the computing-based device to enable application software, such as a lossy synthesizer module 1106 (which performs the method of FIG. 3 or FIG. 10) and/or a design assessment module 1107 (which implements the cost function above) to be executed on the device.

The computer executable instructions may be provided using any computer-readable media that is accessible by computing-based device 1100. Computer-readable media may include, for example, computer storage media such as memory 1108 and communications media. Computer storage media (i.e. non-transitory machine readable media), such as memory 1108, includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device. In contrast, communication media may embody computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transport mechanism. As defined herein, computer storage media does not include communication media. Although the computer storage media (i.e. non-transitory machine readable media, e.g. memory 1108) is shown within the computing-based device 1100 it will be appreciated that the storage may be distributed or located remotely and accessed via a network or other communication link (e.g. using communication interface 1110).

The computing-based device 1100 also comprises an input/output controller 1111 arranged to output display information to a display device 1112 which may be separate from or integral to the computing-based device 1100. The display information may provide a graphical user interface. The input/output controller 1111 is also arranged to receive and process input from one or more devices, such as a user input device 1114 (e.g. a mouse or a keyboard). This user input may be used to specify maximum error bounds (e.g. for use in the method of FIG. 3 or 10) or to select one of a plurality of candidate hardware logic designs based on the output of the design assessment module 1107. In an embodiment the display device 1112 may also act as the user input device 1114 if it is a touch sensitive display device. The input/output controller 1111 may also output data to devices other than the display device, e.g. a locally connected printing device (not shown in FIG. 11).

The terms ‘processor’ and ‘computer’ are used herein to refer to any device, or portion thereof, with processing capability such that it can execute instructions. The term ‘processor’ may, for example, include central processing units (CPUs), graphics processing units (GPUs or VPUs), physics processing units (PPUs), radio processing units (RPUs), digital signal processors (DSPs), general purpose processors (e.g. a general purpose GPU), microprocessors, any processing unit which is designed to accelerate tasks outside of a CPU, etc. Those skilled in the art will realize that such processing capabilities are incorporated into many different devices and therefore the term ‘computer’ includes set top boxes, media players, digital radios, PCs, servers, mobile telephones, personal digital assistants and many other devices.

Those skilled in the art will realize that storage devices utilized to store program instructions can be distributed across a network. For example, a remote computer may store an example of the process described as software. A local or terminal computer may access the remote computer and download a part or all of the software to run the program. Alternatively, the local computer may download pieces of the software as needed, or execute some software instructions at the local terminal and some at the remote computer (or computer network). Those skilled in the art will also realize that, by utilizing conventional techniques known to those skilled in the art, all or a portion of the software instructions may be carried out by a dedicated circuit, such as a DSP, programmable logic array, or the like.

Memories storing machine executable data for use in implementing disclosed aspects can be non-transitory media. Non-transitory media can be volatile or non-volatile. Examples of volatile non-transitory media include semiconductor-based memory, such as SRAM or DRAM. Examples of technologies that can be used to implement non-volatile memory include optical and magnetic memory technologies, flash memory, phase change memory and resistive RAM.

A particular reference to “logic” refers to structure that performs a function or functions. An example of logic includes circuitry that is arranged to perform those function(s). For example, such circuitry may include transistors and/or other hardware elements available in a manufacturing process. Such transistors and/or other elements may be used to form circuitry or structures that implement and/or contain memory, such as registers, flip flops, or latches, logical operators, such as Boolean operations, mathematical operators, such as adders, multipliers, or shifters, and interconnect, by way of example. Such elements may be provided as custom circuits or standard cell libraries, macros, or at other levels of abstraction. Such elements may be interconnected in a specific arrangement. Logic may include circuitry that is fixed function and circuitry that can be programmed to perform a function or functions; such programming may be provided from a firmware or software update or control mechanism. Logic identified to perform one function may also include logic that implements a constituent function or sub-process. In an example, hardware logic has circuitry that implements a fixed function operation, or operations, state machine or process.

Any range or device value given herein may be extended or altered without losing the effect sought, as will be apparent to the skilled person.

It will be understood that the benefits and advantages described above may relate to one embodiment or may relate to several embodiments. The embodiments are not limited to those that solve any or all of the stated problems or those that have any or all of the stated benefits and advantages.

Any reference to ‘an’ item refers to one or more of those items. The term ‘comprising’ is used herein to mean including the method blocks or elements identified, but that such blocks or elements do not comprise an exclusive list and an apparatus may contain additional blocks or elements and a method may contain additional operations or elements. Furthermore, the blocks, elements and operations are themselves not impliedly closed.

The steps of the methods described herein may be carried out in any suitable order, or simultaneously where appropriate. The arrows between boxes in the figures show one example sequence of method steps but are not intended to exclude other sequences or the performance of multiple steps in parallel. Additionally, individual blocks may be deleted from any of the methods without departing from the spirit and scope of the subject matter described herein. Aspects of any of the examples described above may be combined with aspects of any of the other examples described to form further examples without losing the effect sought. Where elements of the figures are shown connected by arrows, it will be appreciated that these arrows show just one example flow of communications (including data and control messages) between elements. The flow between elements may be in either direction or in both directions.

It will be understood that the above description of a preferred embodiment is given by way of example only and that various modifications may be made by those skilled in the art. Although various embodiments have been described above with a certain degree of particularity, or with reference to one or more individual embodiments, those skilled in the art could make numerous alterations to the disclosed embodiments without departing from the spirit or scope of this invention.

The invention claimed is:
 1. A method of generating an implementation of a fixed-point polynomial with rational coefficients in hardware logic that satisfies a defined error bound, the method comprising: receiving, in a synthesizer module, the polynomial as a data-flow graph comprising one or more nodes, wherein each node is either a sum-of-products operator or a constant division operator; computing, in the synthesizer module, an error bound for each node in the data-flow graph based on the defined error bound for the polynomial; optimizing, in the synthesizer module, a single summation array for each node using a truncation scheme and the error bound for each node; calculating, in the synthesizer module, an updated error bound for each node in the data-flow graph, wherein the calculating of updated error bounds redistributes error bounds between nodes to allow for synthesis of the fixed-point polynomial with a reduced physical size whilst satisfying the defined error bound; and further optimizing, in the synthesizer module, the single summation array for each node using the truncation scheme and the updated error bound for the node; and generating the implementation of the fixed-point polynomial in hardware logic using the optimized single summation arrays.
 2. The method according to claim 1, further comprising: inputting RTL generated by the method of claim 1 to a logic synthesizer to produce a gate level description of the implementation of the fixed-point polynomial.
 3. The method according to claim 1, further comprising: fabricating the implementation of the fixed-point polynomial in silicon.
 4. The method according to claim 1, further comprising repeating: calculating, in the synthesizer module, an updated error bound for each node in the data-flow graph; and further optimizing, in the synthesizer module, the single summation array for each node using the truncation scheme and the updated error bound for the node, until a stop condition is reached, the stop condition comprising one of: a maximum iteration count is reached; a maximum runtime is reached; and the updated error bound for each node calculated in an iteration is the same as the updated error bound for each node calculated in an immediately previous iteration, wherein the single summation array which is optimized in an iteration is the single summation array generated, in the synthesizer module, by the optimization in the immediately previous iteration.
 5. The method according to claim 1, wherein optimizing a single summation array for each node using a truncation scheme and the error bound for the node comprises: calculating, in the synthesizer module, a parameter k for each node, wherein: for a node which is a sum-of-products operator, the parameter k is a number of whole columns in the single summation array that can be removed whilst satisfying the error bound for the node; and for a node which is a constant division operator, the parameter k is a number of bits by which an output from the node is right shifted when the constant division operation is implemented as a sum-of-products operation followed by a right shift by k bits.
 6. The method according to claim 5, wherein optimizing a single summation array for each node using a truncation scheme and the error bound for the node further comprises: calculating, in the synthesizer module, an updated input bit width for each node.
 7. The method according to claim 5, wherein calculating an updated error bound for each node in the data-flow graph comprises: calculating, in the synthesizer module, for each node, a rate of change of a number of bits removed from the single summation array for node with respect to a change in error bound for the node.
 8. The method according to claim 7, wherein the fixed-point polynomial is linear with respect to actual errors introduced by each node and wherein the rate of change of a number of bits removed from the single summation array for node j with respect to a change in error bound for the node, $\frac{\partial R_{j}}{\partial e_{i\;}},$ comprises: for a node j which is a sum-of-products operator: $\frac{\partial R_{j}}{\partial e_{i\;}} = \left\{ \begin{matrix} {\frac{1}{2^{k_{j} - 1}},} & {j = i} \\ {0,} & {otherwise} \end{matrix} \right.$ where k_(j) is the parameter k for the node; and for a node j which is a constant division operator: $\frac{\partial R_{j}}{\partial e_{i}} = \left\{ {{\begin{matrix} \frac{n_{q,i}}{2e_{i}} & {j = {i\mspace{14mu}{and}\mspace{14mu}{cond}\mspace{14mu}{holds}}} \\ 0 & {j = {i\mspace{14mu}{and}\mspace{14mu}{cond}\mspace{14mu}{does}\mspace{14mu}{not}\mspace{14mu}{hold}}} \\ \frac{n_{i,j}}{2e_{i}} & {otherwise} \end{matrix}{cond}} = \left( {\frac{2^{k_{i} - 1}}{\left( {\mp 2^{k_{i} - 1}} \right){mod}\; d_{i}} > \left\lfloor \frac{2^{n_{q,i} - r_{i} - 1}}{d_{i}} \right\rfloor} \right)} \right.$ where n_(q,i) is an input bit width into the node, e_(i) is the error bound for node i and e_(i)=u_(i)2^(r) ^(i) and u_(i)ϵ[1,2).
 9. The method according to claim 5, wherein the fixed-point polynomial is linear with respect to actual errors introduced by each node and wherein computing an error bound for each node in the data-flow graph based on the defined error bound for the polynomial comprises: setting, in the synthesizer module, an error for a node i to a value e_(i) where e_(i)=η/(λ_(i)n), η is the defined error bound for the polynomial and λ_(i) is a maximum absolute value of a coefficient at node i.
 10. The method according to claim 9, wherein calculating an updated error bound for each node in the data-flow graph comprises updating, in the synthesizer module, a value e_(i) at node i to a value given by: $\frac{\eta \, e_j \sum_i \frac{\partial R_i}{\partial e_j}}{\lambda_j \sum_{i,j} e_j \frac{\partial R_i}{\partial e_j}}$ where R_(i) is a number of bits removed from the single summation array at node i when optimizing node i using a truncation scheme and the error bound e_(i) for the node i, λ_(j) is a maximum absolute value of a coefficient at node j and e_(j) is the error bound for the node j.
 11. The method according to claim 5, wherein the fixed-point polynomial is non-linear with respect to actual errors introduced by each node and wherein computing an error bound for each node in the data-flow graph based on the defined error bound for the polynomial comprises solving, in the synthesizer module, the following equations via Gröbner bases: $\sum_{\alpha} \lambda_{\alpha} e_j \frac{\partial e^{\alpha}}{\partial e_j} = T \text{ for all } j$ $\sum_{\alpha} \lambda_{\alpha} e^{\alpha} = \eta$ wherein T is a constant, λ_(α)=max_(x)|c_(α)(x)| and e^(α) is an error bound term associated with c_(α)(x).
 12. The method according to claim 11, wherein calculating an updated error bound for each node in the data-flow graph comprises updating a value e_(j) at node j to a value f_(j) given by solving, in the synthesizer module, the following equations using Gröbner bases: $f_j \sum_{\alpha} \lambda_{\alpha} \frac{\partial f^{\alpha}}{\partial f_j} = T e_j \sum_i \frac{\partial R_i}{\partial e_j} \text{ for all } j$ $\sum_{\alpha} \lambda_{\alpha} f^{\alpha} = \eta$.
 13. The method according to claim 1, further comprising: generating, in the synthesizer module, a second implementation of the fixed-point polynomial with rational coefficients in hardware logic that satisfies the defined error bound using the method according to claim 1; and using, in a design assessment module, a heuristic hardware cost function to compare the implementations.
 14. The method according to claim 13, further comprising: in response to the comparison, selecting, in the design assessment module, one of the implementations in hardware logic; and inputting RTL for the selected implementation to a logic synthesizer to produce a gate level description of the implementation of the fixed-point polynomial.
 15. The method according to claim 14, further comprising: fabricating the selected implementation of the fixed-point polynomial in silicon.
 16. The method according to claim 13, wherein the heuristic hardware cost function for an arbitrary binary array, with width w, height h and a number pp of bits in the array is given by: $\text{area}\left(\text{array}(h, w, pp)\right) \approx 7\,pp\left(1 - \left(\frac{2}{3}\right)^{f}\right) + \frac{w}{2}\left(3\left\lceil \log_2 w \right\rceil + 11\right)$ where $f = \left\lceil \frac{\log_2 h - 1}{\log_2 3 - 1} \right\rceil$.
 17. An apparatus configured to perform lossy synthesis of a fixed-point polynomial with rational coefficients and generate an implementation of the fixed-point polynomial in RTL, the apparatus comprising: a processor; and a memory comprising computer executable instructions which, when executed, cause the processor to: receive the polynomial as a data-flow graph comprising one or more nodes, wherein each node is either a sum-of-products operator or a constant division operator; compute an error bound for each node in the data-flow graph based on the defined error bound for the polynomial; optimize a single summation array for each node using a truncation scheme and the error bound for each node; calculate an updated error bound for each node in the data-flow graph, wherein the calculating of updated error bounds redistributes error bounds between nodes to allow for synthesis of the fixed-point polynomial with a reduced physical size whilst satisfying the defined error bound; further optimize the single summation array for each node using the truncation scheme and the updated error bound for the node; and generate the implementation of the fixed-point polynomial in hardware logic using the optimized single summation arrays.
 18. An apparatus configured to compare two or more array summations when implemented in hardware logic, the apparatus comprising: a processor; and a memory comprising computer executable instructions which, when executed, cause the processor to: receive RTL data describing the two or more array summations; and calculate, for each array summation, a cost function using: $\text{area}\left(\text{array}(h, w, pp)\right) \approx 7\,pp\left(1 - \left(\frac{2}{3}\right)^{f}\right) + \frac{w}{2}\left(3\left\lceil \log_2 w \right\rceil + 11\right)$ where $f = \left\lceil \frac{\log_2 h - 1}{\log_2 3 - 1} \right\rceil$, where w is a width of the array, h is a height of the array and pp is a number of bits in the array, wherein the cost functions calculated for each array summation provide for several candidate RTL implementations to be compared and an RTL implementation with an optimal cost function to be selected for implementation in hardware; and wherein a hardware logic implementation of the fixed-point polynomial is generated using the selected array summations.
 19. The apparatus according to claim 18, wherein the memory further comprises computer executable instructions which, when executed, cause the processor to: select one of the array summations with a lowest value of the cost function; and input RTL for the selected array summation to a logic synthesizer to produce a gate level description of the array summation. 
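By way of illustration only, the arithmetic recited in claims 9, 10 and 16 may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed synthesizer or design assessment module: the function names are hypothetical, n in claim 9 is assumed to be the number of nodes in the data-flow graph, the matrix dR (with dR[i][j] standing for ∂R_i/∂e_j) is assumed to be supplied by the caller, and clamping f at zero for arrays of height 1 is an added assumption. The coefficient 3 on the ceiling term follows claim 16.

```python
import math


def array_area(h, w, pp):
    """Heuristic area of a binary array (formula as recited in claim 16).

    h: array height, w: array width, pp: number of bits in the array.
    """
    if h < 1 or w < 1:
        raise ValueError("h and w must be at least 1")
    f = math.ceil((math.log2(h) - 1) / (math.log2(3) - 1))
    f = max(f, 0)  # assumption: a single-row array needs no reduction stages
    reduction = 7 * pp * (1 - (2 / 3) ** f)
    final_adder = (w / 2) * (3 * math.ceil(math.log2(w)) + 11)
    return reduction + final_adder


def initial_error_bounds(eta, lambdas):
    """Initial allocation e_i = eta / (lambda_i * n), linear case (claim 9).

    Assumes n is the number of nodes, i.e. len(lambdas).
    """
    n = len(lambdas)
    return [eta / (lam * n) for lam in lambdas]


def updated_error_bounds(eta, e, lambdas, dR):
    """Redistribute the error bounds using the expression in claim 10.

    dR[i][j] is assumed to hold the rate of change of R_i with respect to e_j.
    """
    n = len(e)
    # Sum over i of dR_i/de_j, one value per node j.
    col = [sum(dR[i][j] for i in range(n)) for j in range(n)]
    denom = sum(e[j] * col[j] for j in range(n))
    if denom == 0:
        return list(e)  # assumption: leave the bounds unchanged if no gradient
    return [eta * e[j] * col[j] / (lambdas[j] * denom) for j in range(n)]


if __name__ == "__main__":
    e0 = initial_error_bounds(1.0, [1.0, 2.0])
    e1 = updated_error_bounds(1.0, e0, [1.0, 2.0], [[1.0, 0.0], [0.0, 3.0]])
    print(e0, e1)
    print(array_area(2, 8, 16), array_area(3, 8, 24))
```

In this sketch both the initial and the updated bounds satisfy Σ λ_j e_j = η, so each update moves error budget between nodes without changing the total. Candidate arrays may then be compared by evaluating array_area for each and preferring the lowest value, in the manner of claim 19.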